Investigating sentence weighting components for automatic summarisation

Authors

Shao Fen Liang

Siobhan Devlin

John Tait

Abstract

The work described here initially formed part of a triangulation exercise to establish the effectiveness of the Query Term Order (QTO) algorithm. The resulting methodology subsequently proved to be a reliable indicator of quality for summarising English web documents. We used the human summaries from the Document Understanding Conference data and generated queries automatically to test the QTO algorithm. Six sentence weighting schemes that made use of Query Term Frequency and QTO were constructed to produce system summaries, and this paper explains the process of combining and balancing the weighting components. We also examined the five automatically generated query terms in their different permutations to check whether the automatic generation of query terms introduced bias. The summaries produced were evaluated with the ROUGE-1 metric, and the results showed that using QTO in a weighting combination gave the best performance. We also found that combining more weighting components always improved performance compared with any single weighting component.
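As a rough illustration of the ideas in the abstract, the sketch below combines a query term frequency component with an order-based component and scores a summary by unigram recall in the style of ROUGE-1. The abstract does not define QTO or the six weighting schemes, so `order_score`, the linear mix controlled by `alpha`, and all function names here are assumptions made purely for illustration, not the authors' method.

```python
from collections import Counter
from itertools import permutations


def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation."""
    tokens = (t.strip(".,;:!?()\"'") for t in text.lower().split())
    return [t for t in tokens if t]


def qtf_score(sentence_tokens, query_terms):
    """Share of sentence tokens that are query terms."""
    if not sentence_tokens:
        return 0.0
    query = set(query_terms)
    hits = sum(1 for t in sentence_tokens if t in query)
    return hits / len(sentence_tokens)


def order_score(sentence_tokens, query_terms):
    """Hypothetical stand-in for QTO (not the published algorithm):
    fraction of adjacent query-term pairs whose first occurrences
    appear in the same order in the sentence."""
    if len(query_terms) < 2:
        return 0.0
    first_pos = {}
    for i, tok in enumerate(sentence_tokens):
        first_pos.setdefault(tok, i)
    kept = 0
    for a, b in zip(query_terms, query_terms[1:]):
        if a in first_pos and b in first_pos and first_pos[a] < first_pos[b]:
            kept += 1
    return kept / (len(query_terms) - 1)


def combined_weight(sentence, query_terms, alpha=0.5):
    """Assumed linear balance between the two components."""
    tokens = tokenize(sentence)
    return (alpha * qtf_score(tokens, query_terms)
            + (1 - alpha) * order_score(tokens, query_terms))


def query_orderings(query_terms):
    """All permutations of the query terms, for checking order bias."""
    return list(permutations(query_terms))


def rouge1_recall(candidate, reference):
    """Clipped unigram overlap divided by reference unigram count."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(n, cand[tok]) for tok, n in ref.items())
    return overlap / total


if __name__ == "__main__":
    query = ["query", "term", "order"]
    sentence = "The query term order algorithm ranks sentences."
    print(combined_weight(sentence, query))
    print(len(query_orderings(["a", "b", "c", "d", "e"])))
    print(rouge1_recall("the cat sat", "the cat sat on the mat"))
```

Five query terms give 120 orderings, so a permutation check of the kind the abstract mentions stays cheap to run exhaustively.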

Similar resources

The HOLJ Corpus: Supporting Summarisation Of Legal Texts

We describe an XML-encoded corpus of texts in the legal domain which was gathered for an automatic summarisation project. We describe two distinct layers of annotation: manual annotation of the rhetorical status of sentences and an entirely automatic annotation process incorporating a host of individual linguistic processors. The manual rhetorical status annotation has been developed as trainin...

An Approach for Query-Focused Text Summarisation for Evidence Based Medicine

We present an approach for extractive, query-focused, single-document summarisation of medical text. Our approach utilises a combination of target-sentence-specific and target-sentence-independent statistics derived from a corpus specialised for summarisation in the medical domain. We incorporate domain knowledge via the application of multiple domain-specific features, and we customise the answ...

Summarising text with a genetic algorithm-based sentence extraction

Automatic text summarisation has long been studied and used. The growth in the amount of information on the web results in more demands for automatic methods for text summarisation. Designing a system to produce human-quality summaries is difficult and therefore, many researchers have focused on sentence or paragraph extraction, which is a kind of summarisation. In this paper, we introduce a ne...

The influence of personal pronouns for automatic summarisation of scientific articles

In automatic summarisation, statistical methods based on tokens’ frequency are commonly used in combination with other methods or on their own to extract important sentences from a text. Quite often researchers justify the relatively poor performance of these statistical methods by the fact that they do not consider the anaphoric relations between words. In this paper, we perform a comprehensiv...

Opinion-aware information management : statistical summarisation and knowledge representation of opinions

Nowadays, an increasing number of media platforms provide users with opportunities for sharing their opinions about products, companies or people. In order to support users accessing opinion-based information, and to support engineers building systems that require opinion-aware reasoning, intelligent opinion-aware tools and techniques are needed. This thesis contributes methods and technolog...

Journal title:

Inf. Process. Manage.

Volume 43, Issue

Pages -

Publication date: 2007
